Convolutional neural networks (CNNs) are currently among the most widely used neural networks and achieve state-of-the-art performance on many problems. Although originally applied to computer vision tasks, CNNs work well with any spatially structured data, not only images, and have been applied in many fields. However, recent work has highlighted that CNNs, like other deep learning models, are sensitive to noise injection, which can jeopardise their performance. This paper quantifies the numerical uncertainty arising from floating point arithmetic inaccuracies in the inference stage of DeepGOPlus, a CNN that predicts protein function, in order to determine its numerical stability. In addition, this paper investigates the possibility of using reduced-precision floating point formats for DeepGOPlus inference to reduce memory consumption and latency. This is achieved with Monte Carlo Arithmetic, a technique that experimentally quantifies floating point operation errors, and VPREC, a tool that emulates results with customizable floating point precision formats. Focus is placed on the inference stage because it is the main deliverable of the DeepGOPlus model: it will be used across environments and is therefore likely to be subjected to the most noise. Furthermore, studies have shown that the inference stage is the part of the model most amenable to reduced precision. Overall, the numerical uncertainty of the DeepGOPlus CNN is found to be very low at its current numerical precision format, but the model cannot currently be reduced to a lower precision that might render it more lightweight.
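To make the idea of Monte Carlo Arithmetic concrete, the following is a minimal, standard-library sketch of the general technique: the result of every floating point operation is perturbed with uniform noise at a chosen virtual precision `t`, the computation is repeated many times, and the number of significant decimal digits is estimated as -log10(sigma/|mu|). It is a toy illustration under stated assumptions, not the instrumentation used in the paper; the helper names (`perturb`, `noisy_sum`, `significant_digits`), the summation workload, and the choices `t=24` and 30 samples are all hypothetical.

```python
import math
import random
import statistics


def perturb(x, t, rng):
    """Return x plus uniform noise at its t-th significant bit (hypothetical helper)."""
    if x == 0.0:
        return x
    exponent = math.frexp(x)[1]  # x = m * 2**exponent with 0.5 <= |m| < 1
    xi = rng.uniform(-0.5, 0.5)
    return x + xi * 2.0 ** (exponent - t)


def noisy_sum(values, t, rng):
    """Sum values left to right, perturbing the result of every addition."""
    acc = 0.0
    for v in values:
        acc = perturb(acc + v, t, rng)
    return acc


def significant_digits(samples):
    """Estimate significant decimal digits as -log10(sigma / |mu|)."""
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    if sigma == 0.0:
        return math.inf
    return -math.log10(sigma / abs(mu))


# Assumed toy workload: add 0.1 a thousand times at a virtual precision of 24 bits.
rng = random.Random(0)
values = [0.1] * 1000
samples = [noisy_sum(values, t=24, rng=rng) for _ in range(30)]
digits = significant_digits(samples)
print(f"mean={statistics.mean(samples):.6f}, significant digits ~ {digits:.1f}")
```

A low digit count across repeated runs signals numerical instability; a count close to the format's nominal precision suggests the computation is stable at that precision.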
translated by 谷歌翻译
With the increasing use of Graph Neural Networks (GNNs) in critical real-world applications, several post hoc explanation methods have been proposed to understand their predictions. However, there has been no work on generating explanations on the fly during model training and using them to improve the expressive power of the underlying GNN models. In this work, we introduce a novel explanation-directed neural message passing framework for GNNs, EXPASS (EXplainable message PASSing), which aggregates only embeddings from nodes and edges identified as important by a GNN explanation method. EXPASS can be used with any existing GNN architecture and subgraph-optimizing explainer to learn accurate graph embeddings. We theoretically show that EXPASS alleviates the oversmoothing problem in GNNs by slowing the layer-wise loss of Dirichlet energy, and that the embedding difference between vanilla message passing and the EXPASS framework can be upper bounded by the difference of their respective model weights. Our empirical results show that graph embeddings learned using EXPASS improve predictive performance and alleviate the oversmoothing problem of GNNs, opening up new frontiers in graph machine learning for developing explanation-based training frameworks.
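The two ideas in this abstract, aggregating only over edges marked important and tracking Dirichlet energy, can be sketched on a toy graph. This is an illustrative sketch only, not the EXPASS implementation: the mean aggregation with a self-loop, the hand-set edge mask (standing in for an explainer's output), the unnormalized energy definition (sum of squared feature differences over edges), and the function names are all assumptions.

```python
def propagate(features, edges, edge_weight):
    """One mean-aggregation step with a self-loop; edge_weight scales each message."""
    n = len(features)
    dim = len(features[0])
    total = [list(f) for f in features]  # self-loop contribution, weight 1
    norm = [1.0] * n
    for (i, j) in edges:
        w = edge_weight[(i, j)]
        for a, b in ((i, j), (j, i)):  # undirected edge: send messages both ways
            for k in range(dim):
                total[a][k] += w * features[b][k]
            norm[a] += w
    return [[v / norm[i] for v in total[i]] for i in range(n)]


def dirichlet_energy(features, edges):
    """Sum of squared feature differences over edges (assumed unnormalized form)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        for i, j in edges
    )


# Path graph 0-1-2-3 with two groups of nodes; scalar features for readability.
edges = [(0, 1), (1, 2), (2, 3)]
features = [[0.0], [0.0], [1.0], [1.0]]

vanilla = {e: 1.0 for e in edges}   # ordinary message passing
masked = dict(vanilla)
masked[(1, 2)] = 0.0                # hypothetical explainer marks this edge unimportant

e_vanilla = dirichlet_energy(propagate(features, edges, vanilla), edges)
e_masked = dirichlet_energy(propagate(features, edges, masked), edges)
print(e_vanilla, e_masked)
```

On this toy graph the masked step keeps the two groups apart, so it retains more Dirichlet energy than the vanilla step, which is the intuition behind the oversmoothing claim; the general result is the paper's theorem, not something this example proves.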
Prompt tuning, which conditions a model on task-specific learned prompt vectors, has emerged as a data-efficient and parameter-efficient method for adapting large pretrained vision-language models to multiple downstream tasks. However, existing approaches usually learn prompt vectors for each task independently from scratch, thereby failing to exploit the rich shareable knowledge across different vision-language tasks. In this paper, we propose multitask vision-language prompt tuning (MVLPT), which incorporates cross-task knowledge into prompt tuning for vision-language models. Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks to initialize the prompt for each target task; (ii) we show that many target tasks can benefit each other from sharing prompt vectors and thus can be jointly learned via multitask prompt tuning. We benchmark the proposed MVLPT using three representative prompt tuning methods, namely text prompt tuning, visual prompt tuning, and unified vision-language prompt tuning. Results on 20 vision tasks demonstrate that the proposed approach outperforms all single-task baseline prompt tuning methods, setting a new state of the art on the few-shot ELEVATER benchmarks and cross-task generalization benchmarks. To understand where cross-task knowledge is most effective, we also conduct a large-scale study on task transferability with 20 vision tasks in 400 combinations for each prompt tuning method. It shows that the most performant MVLPT for each prompt tuning method prefers different task combinations and that many tasks can benefit each other, depending on their visual similarity and label similarity. Code is available at https://github.com/sIncerass/MVLPT.
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters. In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh, on which the tensor may be distributed with the same or different layouts. We formalize this as a many-to-many multicast communication problem, and show that existing approaches either are sub-optimal or do not generalize to different network topologies or tensor layouts, which result from different model architectures and parallelism strategies. We then propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule. On microbenchmarks, our overall system outperforms existing ones by up to 10x across various tensor and mesh layouts. On end-to-end training of two large models, GPT-3 and U-Transformer, we improve throughput by 10% and 50%, respectively.
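As a rough illustration of what cross-mesh resharding has to move, the sketch below computes which slice of a tensor each source device must send to each destination device. It covers only the "what goes where" bookkeeping under strong simplifying assumptions (a 1-D tensor, even contiguous sharding on both meshes); it does not reproduce the paper's broadcast-based communication system or pipeline schedule, and the function names are hypothetical.

```python
def shard_bounds(length, num_devices):
    """Half-open [start, end) index range owned by each device under even sharding."""
    if length % num_devices != 0:
        raise ValueError("length must divide evenly across devices")
    size = length // num_devices
    return [(d * size, (d + 1) * size) for d in range(num_devices)]


def resharding_plan(length, num_src, num_dst):
    """List (src_device, dst_device, start, end) transfers between two layouts."""
    plan = []
    for src, (s0, s1) in enumerate(shard_bounds(length, num_src)):
        for dst, (d0, d1) in enumerate(shard_bounds(length, num_dst)):
            start, end = max(s0, d0), min(s1, d1)
            if start < end:  # the two shards overlap, so data must move
                plan.append((src, dst, start, end))
    return plan


# A 12-element tensor sharded over 2 source devices and 3 destination devices.
plan = resharding_plan(12, num_src=2, num_dst=3)
for transfer in plan:
    print(transfer)
```

Each source shard here feeds more than one destination device and one destination shard is assembled from two sources, which is the many-to-many pattern the abstract formalizes.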
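Incorporating personal preference is crucial for advanced machine translation tasks. Despite recent advances in machine translation, properly reflecting personal style remains a demanding task. In this paper, we introduce a personalized automatic post-editing (APE) framework to address this challenge, which effectively generates sentences that take distinct personal behaviors into account. To build this framework, we first collect post-editing data that capture user preferences from a live machine translation system. Specifically, real-world users enter source sentences for translation and edit the machine-translated output according to their preferred style. We then propose a model that combines a discriminator module and user-specific parameters on top of the APE framework. Experimental results show that the proposed method outperforms other baseline models on four different metrics (i.e., BLEU, TER, YiSi-1, and human evaluation).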
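Most classical SLAM systems rely on the static scene assumption, which limits their applicability in real-world scenarios. Recent SLAM frameworks have been proposed to track the camera and moving objects simultaneously. However, they are often unable to estimate the canonical pose of the objects and exhibit low object tracking accuracy. To solve this problem, we propose TwistSLAM++, a semantic, dynamic SLAM system that fuses stereo images and LiDAR information. Using semantic information, we track potentially moving objects and associate them with 3D object detections in LiDAR scans to obtain their pose and size. We then perform registration on consecutive object scans to refine the object pose estimation. Finally, object scans are used to estimate the shape of the object and to constrain map points to lie on the estimated surface within the bundle adjustment (BA). We show on classical benchmarks that this fusion approach based on multimodal information improves the accuracy of object tracking.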
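Training models on data obtained from randomized experiments is ideal for making good decisions. However, randomized experiments are often time-consuming, expensive, risky, infeasible, or unethical, and decision makers have little choice but to rely on observational data collected under historical policies when training models. This raises questions not only about which decision-making policies would perform best in practice, but also about the impact of different data collection protocols on the performance of the various policies trained on the data, and about the robustness of policy performance to changes in problem characteristics such as action- or reward-specific delays in observing outcomes. We aim to answer such questions for the problem of optimizing sales channel allocation at LinkedIn, where sales accounts (leads) need to be allocated to one of three channels, with the goal of maximizing the number of successful conversions over a period of time. A key feature of the problem is the presence of stochastic delays in observing allocation outcomes, whose distribution depends on both the channel and the outcome. We built a discrete-time simulation that can handle these problem features and used it to evaluate: a) a historical rule-based policy; b) a supervised machine learning policy (XGBoost); and c) multi-armed bandit (MAB) policies, under different scenarios involving: i) the data collection used for training (observational vs. randomized); ii) lead conversion scenarios; iii) delay distributions. Our simulation results indicate that LinUCB, a simple MAB policy, consistently outperforms the other policies, achieving an 18-47% lift relative to the rule-based policy.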
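For readers unfamiliar with LinUCB, the policy named in this abstract, here is a minimal standard-library sketch of the disjoint LinUCB algorithm with one linear model per channel. It is a hedged illustration, not the authors' simulation: the class name `LinUCBArm`, the constant one-dimensional context, the deterministic per-channel rewards, and `alpha=0.5` are assumptions, and the delayed outcome observations that the paper studies are not modeled.

```python
import math


class LinUCBArm:
    """Ridge-regression model for one arm; keeps the inverse of A = I + sum(x x^T)."""

    def __init__(self, dim):
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, v):
        return [sum(row[k] * v[k] for k in range(len(v))) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Predicted reward plus an exploration bonus alpha * sqrt(x^T A^-1 x)."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Rank-one (Sherman-Morrison) update of A^-1, then accumulate reward * x."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


# Assumed toy setup: three channels with fixed conversion values, observed immediately.
true_rates = [0.1, 0.9, 0.5]
arms = [LinUCBArm(dim=1) for _ in true_rates]
counts = [0] * len(arms)
context = [1.0]
for _ in range(200):
    chosen = max(range(len(arms)), key=lambda a: arms[a].ucb(context, alpha=0.5))
    arms[chosen].update(context, true_rates[chosen])
    counts[chosen] += 1
print(counts)
```

The exploration bonus shrinks as an arm accumulates observations, so allocation concentrates on the channel with the highest estimated reward; with delayed feedback, updates would have to wait until outcomes arrive, which is the complication the simulation in the abstract addresses.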
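IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of IceCube data. Reconstructing and classifying events is challenging because of the detector geometry, the inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, IceCube events can be represented as point cloud graphs, with a graph neural network (GNN) used as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction, and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range with the current state-of-the-art maximum likelihood techniques used in IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR) compared to current IceCube methods. Alternatively, the GNN reduces the FPR by more than a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by 13%-20% on average compared to current maximum likelihood techniques. When run on a GPU, the GNN can process IceCube events at a rate nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.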
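One simple way to turn a set of sensor hits into a point cloud graph is to connect each hit to its nearest neighbors in space. The sketch below does this with the standard library only; the k-nearest-neighbor rule, the use of 3-D position alone as the distance, and the function name `knn_edges` are assumptions for illustration, since the abstract does not specify how the graph is built.

```python
import math


def knn_edges(positions, k):
    """Return directed edges (i, j) linking each point i to its k nearest neighbors j."""
    edges = []
    for i, p in enumerate(positions):
        others = [(math.dist(p, q), j) for j, q in enumerate(positions) if j != i]
        for _, j in sorted(others)[:k]:
            edges.append((i, j))
    return edges


# Four hypothetical sensor hits given as (x, y, z) positions.
hits = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
edges = knn_edges(hits, k=1)
print(edges)
```

In a fuller pipeline each node would also carry per-hit features such as time and charge, and the resulting edge list would be passed to a GNN library for message passing.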
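Recent trends in self-supervised representation learning have focused on removing inductive biases from training pipelines. However, inductive biases can be useful in settings where limited data are available, or can provide additional insight into the underlying data distribution. We present spatial prior attention (SPAN), a framework that takes advantage of consistent spatial and semantic structure in unlabeled image datasets to guide the attention of Vision Transformers. SPAN operates by regularizing attention masks from separate transformer heads to follow various priors over semantic regions. These priors can be derived from data statistics or from a single labeled sample provided by a domain expert. We study SPAN through several detailed real-world scenarios, including medical image analysis and visual quality assurance. We find that the resulting attention masks are more interpretable than those derived from domain-agnostic pretraining. SPAN produces a 58.7 mAP improvement for lung and heart segmentation. We also find that our method yields a 2.2 mAUC improvement over domain-agnostic pretraining when transferring the pretrained model to a downstream chest disease classification task. Finally, we show that SPAN pretraining leads to higher downstream classification performance in low-data regimes compared to domain-agnostic pretraining.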
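The HoloLens (Microsoft Corp., Redmond, WA), a head-worn, optically see-through augmented reality display, is the main player in the recent boost in medical augmented reality research. In medical settings, the HoloLens enables the physician to obtain immediate insight into patient information, directly overlaid on their view of the clinical scenario; the medical student to gain a better understanding of complex anatomies or procedures; and even the patient to carry out therapeutic tasks with improved, immersive guidance. In this systematic review, we provide a comprehensive overview of the use of the first-generation HoloLens within the medical domain, from its release in March 2016 until 2021, when attention is shifting towards its successor, the HoloLens 2. We identified 171 relevant publications through a systematic search of the PubMed and Scopus databases. We analyze these publications with regard to their intended use case, technical methodology for registration and tracking, data sources, visualization, and validation and evaluation. We find that, although the feasibility of using the HoloLens in various medical scenarios has been shown, increased efforts in the areas of precision, reliability, usability, workflow, and perception are necessary to establish AR in clinical practice.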